How bandwidth selection algorithms impact exploratory data analysis using kernel density estimation.

Authors

  • Jared K Harpole
  • Carol M Woods
  • Thomas L Rodebaugh
  • Cheri A Levinson
  • Eric J Lenze
Abstract

Exploratory data analysis (EDA) can reveal important features of underlying distributions, and these features often have an impact on inferences and conclusions drawn from data. Graphical analysis is central to EDA, and graphical representations of distributions often benefit from smoothing. A viable method of estimating and graphing the underlying density in EDA is kernel density estimation (KDE). This article provides an introduction to KDE and examines alternative methods for specifying the smoothing bandwidth in terms of their ability to recover the true density. We also illustrate the comparison and use of KDE methods with 2 empirical examples. Simulations were carried out in which we compared 8 bandwidth selection methods (Sheather-Jones plug-in [SJDP], normal rule of thumb, Silverman's rule of thumb, least squares cross-validation, biased cross-validation, and 3 adaptive kernel estimators) using 5 true density shapes (standard normal, positively skewed, bimodal, skewed bimodal, and standard lognormal) and 9 sample sizes (15, 25, 50, 75, 100, 250, 500, 1,000, 2,000). Results indicate that, overall, SJDP outperformed all methods. However, for smaller sample sizes (25 to 100) either biased cross-validation or Silverman's rule of thumb was recommended, and for larger sample sizes the adaptive kernel estimator with SJDP was recommended. Information is provided about implementing the recommendations in the R computing language.
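As a rough illustration (not the authors' code), the bandwidth selectors compared in the study can be tried in base R with density() and the built-in bw.* selectors. The mapping of R function names to the article's labels is an assumption here: bw.SJ(method = "dpi") for the Sheather-Jones direct plug-in, bw.nrd0 for Silverman's rule of thumb, bw.nrd for the normal rule of thumb, bw.ucv for least squares cross-validation, and bw.bcv for biased cross-validation.

set.seed(1)
x <- rlnorm(100)  # hypothetical positively skewed sample

bandwidths <- c(
  sjdp      = bw.SJ(x, method = "dpi"),  # Sheather-Jones direct plug-in
  silverman = bw.nrd0(x),                # Silverman's rule of thumb
  normal    = bw.nrd(x),                 # normal reference rule
  lscv      = bw.ucv(x),                 # least squares (unbiased) cross-validation
  bcv       = bw.bcv(x)                  # biased cross-validation
)
print(round(bandwidths, 3))

# Overlay the resulting estimates to see how the bandwidth choice changes the picture.
plot(density(x, bw = bandwidths["sjdp"]), main = "KDE under different bandwidth selectors")
for (b in bandwidths[-1]) lines(density(x, bw = b), lty = 2)

Each of these selectors returns a single global bandwidth; the adaptive kernel estimators examined in the study instead let the smoothing vary over the data, as in the localized-bandwidth sketch further below.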

Related articles

Fast optimal bandwidth selection for kernel density estimation

We propose a computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel based density derivative estimation that reduces the computational complexity from O(MN) to linear O(N + M). We apply the procedure to estimate the optimal bandwidth for kernel density estimation. We demonstrate the speedup achieved on this problem using the "solve-the-equation plug-in" method, ...
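For context, a minimal sketch (my notation, not taken from that paper) of the direct O(MN) evaluation that such approximation schemes are designed to avoid: every one of the M evaluation points requires a Gaussian kernel term for each of the N observations.

naive_gauss_kde <- function(eval_pts, data, h) {
  # Direct evaluation: one kernel term per (evaluation point, observation) pair,
  # i.e., O(M * N) work for M evaluation points and N observations.
  sapply(eval_pts, function(y) mean(dnorm((y - data) / h)) / h)
}

x  <- rnorm(1000)                   # N observations (hypothetical)
g  <- seq(-4, 4, length.out = 512)  # M evaluation points
fh <- naive_gauss_kde(g, x, h = bw.SJ(x))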

A Bayesian approach to bandwidth selection for multivariate kernel density estimation

Kernel density estimation for multivariate data is an important technique that has a wide range of applications. However, it has received significantly less attention than its univariate counterpart. The lower level of interest in multivariate kernel density estimation is mainly due to the increased difficulty in deriving an optimal data-driven bandwidth as the dimension of the data increases. ...

Semiparametric Localized Bandwidth Selection in Kernel Density Estimation

Since conventional cross-validation bandwidth selection methods do not work for the case where the data considered are serially dependent, alternative bandwidth selection methods are needed. In recent years, Bayesian-based global bandwidth selection methods have been proposed. Our experience shows that the use of a global bandwidth is, however, less suitable than using a localized bandwidth in ke...
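To make the global-versus-localized distinction concrete, here is a toy sample-point adaptive estimator (Abramson-style shrinking of the bandwidth where a pilot estimate is high). This is not the semiparametric Bayesian method of that paper, only an illustration of a bandwidth that varies rather than staying fixed.

adaptive_kde <- function(eval_pts, data, h0) {
  # Pilot density at each observation, from a fixed-bandwidth estimate.
  pilot <- approx(density(data, bw = h0), xout = data)$y
  # Per-observation bandwidths: smaller where the pilot density is high.
  h_i <- h0 * sqrt(exp(mean(log(pilot))) / pilot)
  sapply(eval_pts, function(y) mean(dnorm(y, mean = data, sd = h_i)))
}

x <- rlnorm(200)
g <- seq(0, 10, length.out = 400)
plot(g, adaptive_kde(g, x, h0 = bw.SJ(x)), type = "l", main = "Adaptive-bandwidth KDE")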

Bandwidth Selection for Multivariate Kernel Density Estimation Using MCMC

Kernel density estimation for multivariate data is an important technique that has a wide range of applications in econometrics and finance. However, it has received significantly less attention than its univariate counterpart. The lower level of interest in multivariate kernel density estimation is mainly due to the increased difficulty in deriving an optimal data-driven bandwidth as the dimens...

Asymptotic Behaviors of Nearest Neighbor Kernel Density Estimator in Left-truncated Data

Kernel density estimators are the basic tools for density estimation in non-parametric statistics. The k-nearest neighbor kernel estimators represent a special form of kernel density estimators, in which the bandwidth is varied depending on the location of the sample points. In this paper, we initially introduce the k-nearest neighbor kernel density estimator in the random left-truncatio...
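As a hedged sketch of the basic idea (ignoring the left-truncation adjustment that is the subject of that paper), a k-nearest-neighbor kernel estimator sets the bandwidth at each evaluation point to the distance to the k-th nearest observation, so the smoothing widens where data are sparse.

knn_kde <- function(eval_pts, data, k = 10) {
  sapply(eval_pts, function(y) {
    h <- sort(abs(y - data))[k]        # distance to the k-th nearest observation
    mean(dnorm((y - data) / h)) / h    # Gaussian kernel with this local bandwidth
  })
}

x <- rexp(150)  # hypothetical, fully observed sample
g <- seq(0, 6, length.out = 300)
plot(g, knn_kde(g, x, k = 15), type = "l", main = "k-NN kernel density estimate")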

Journal:
  • Psychological Methods

Volume 19, Issue 3

Pages: -

Published: 2014